Topological, on-the-fly classification of objects into a global set and local sets



Technical field of the invention 

The present invention relates to methods and apparatus for concurrently
executing program threads in computer systems, and more particularly to the
classification of objects into a global set and local sets, and the application thereof
for detecting inconsistent dynamic concurrency state transitions, such as data races.

Background of the invention

In complex computer systems, multiple execution paths, or 'threads', can
perform several tasks simultaneously (multi-threading). Each thread may then
perform a different job, such as waiting for events to happen, or performing a
time-consuming job that the program does not need to complete before going on.
Multi-threading is used more and more, for example in FTP servers, background
spelling checkers and parallel scientific calculations. Performing several tasks
simultaneously can improve the execution speed by executing the threads on separate
processors, or it can improve the response time to events by suspending less
time-critical threads and allowing a more critical thread to react quickly to the
event. In a multi-processor system, execution of threads can progress at different
speeds depending on, for example, different load.

This often results in two or more threads simultaneously modifying a shared
resource in a non-deterministic way, a situation often known as a data race. For
example, a thread may read data from a certain address that can simultaneously be
written to by one or more other threads. The actual data read depends on the order
of reading and writing by the individual threads. The non-determinism resulting from
such data races can cause the program to produce erroneous results.

In Fig. 15, a simple example of a data race is shown. On the left, a thread T2
accesses a common object A and writes the value 5 to it. This is followed by thread
T1 accessing the same object A and writing the value 6 to it. The result of the
operation is that object A contains the value 6. On the right, thread T1 executes
faster, which results in the same events happening but in reverse order: first thread
T1 accesses the object A and writes the value 6 to it, and then thread T2 accesses
the object A and writes the value 5 to it. This results in the object A containing the
value 5. If it is known where data races are likely to occur, synchronisation can be
added by the programmer into the code to force a specific order.
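As a rough illustration of the situation of Fig. 15, the following Java sketch starts
two threads that write different values to the same object without synchronisation.
The names SharedObject, DataRaceSketch and runOnce are hypothetical and are not taken
from the present description. The value left in the object is 5 or 6, depending on
which write happens last.

```java
// Hypothetical illustration of the race of Fig. 15; all names are assumptions.
class SharedObject {
    int value; // member variable written by both threads without synchronisation
}

class DataRaceSketch {
    // Lets threads T1 and T2 write to one common object A and returns the final value.
    static int runOnce() {
        SharedObject a = new SharedObject();
        Thread t1 = new Thread(() -> a.value = 6); // T1 writes the value 6
        Thread t2 = new Thread(() -> a.value = 5); // T2 writes the value 5
        t2.start();
        t1.start();
        try {
            // Wait for both threads so that the final value can be read safely.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The order of the two writes is not fixed, so this is either 5 or 6.
        return a.value;
    }

    public static void main(String[] args) {
        System.out.println("Object A contains " + runOnce());
    }
}
```

Forcing a specific order, for example by joining T2 before starting T1, would make
the result deterministic, at the cost of the extra synchronisation.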

It is a problem to detect data race errors in multi-threaded systems, for two
reasons. First of all, they are non-deterministic. Even if they are observed in
one run, they may not occur again during a next run. This makes tracing of errors
very difficult or impossible. Secondly, they are non-local. One thread may be
performing a spelling check and another may be editing the text being checked.
These are two almost totally unrelated sections of code that, if not well
synchronised, may cause problems.

To avoid data races, a programmer can force fragments of code running on
different threads to execute in a certain order by adding extra synchronisation
between these threads. Hence, there is a need to know which parts of the code
could be involved in a data race so that the appropriate action can be taken.
Known techniques for checking for data races are:

- Static checking for data races on the source code, such as described by Netzer,
R.H.B., "Race condition detection for debugging shared-memory parallel programs"
(Ph.D. thesis, University of Wisconsin-Madison), and in US-5,822,588. A problem
with this technique is that the interaction of threads varies dynamically while the
threads are executing. Finding all data races through static analysis is generally an
NP-complete problem.

- Post-mortem analysis of the state of a system in which an erroneous result was
determined. An advantage of this technique is that only one execution of the
program is being analysed. Therefore, only data races that occurred during a
specific interaction of the threads are considered and the search space for data
races is reduced. A problem with this technique is that the occurrence of data races
is non-deterministic. This implies that it may take a very long time before a state of a
system can be reached in which a data race produces an erroneous result.
Furthermore, the state of such a system is usually recorded at a point in the
execution of the system long after the actual data race occurred. It is therefore very
hard to backtrack to the original data race.

- Dynamic checking for data races during a particular execution (on-the-fly analysis),
as described in US-6,009,269. The same advantages as for post-mortem analysis
apply. An extra advantage is that data races are detected as they occur, so the
problems of backtracking from a certain point in the execution back to the data race
can be reduced. A problem with this known approach is that every operation on data
has to be observed. This results in a very large execution time overhead, making
dynamic detection of data races very time consuming and very intrusive compared
to the original execution.

Assured is a tool capable of detecting, among other things, data races in Java
programs. A shortcoming of Assured is that, when two events race (so their vector
clocks are parallel) but their threads do not actually overlap in time, no race is
detected.

Garbage collectors, or rather "incremental garbage collectors", are used for
reclamation of storage or memory space during execution of a computer program.
Popular programming languages such as C or C++ allow programmers to explicitly
allocate and deallocate portions of memory. This requires careful programming. One
way of solving this problem is to use a garbage collector. One problem with
incremental garbage collectors with a programming language such as C is that the
program can alter pointer references "behind the back of the garbage collector".
This means that between activations (hence the word "incremental") of the garbage
collector, the references to objects may have changed, which can result in incorrect
deallocation of memory. Various techniques are known to solve this problem. For
example, US 6,055,612 describes a method of increasing the security of the
memory decommit operation. However, the garbage collector still absorbs a large
amount of processing time. Languages such as Java™ prohibit the use of explicit
deallocation by programs for which the garbage collector is collecting garbage. The
problem with this solution is that legacy programs cannot be upgraded. Also, a
program may only require a small amount of memory to be freed, but the garbage
collection process takes a long time and frees more memory than currently required.
Hence, there is a need for a garbage collector which has increased flexibility and
speed without sacrificing the security of memory deallocation.

It is an object of the present invention to provide a method and apparatus or
mechanisms for more efficient dynamic tracking of objects in multi-threaded
computer programs.

It is a further object of the present invention to provide a method and
apparatus or mechanisms for detecting inconsistent dynamic concurrency state
transitions in the execution of multi-threaded programs, which reduces the time
overhead involved.

It is a further object of the present invention to provide improved compiler,
interpreter and garbage collector mechanisms.

Summary of the invention

The above objects are solved by a method for classifying objects into a set of
global objects and sets of local objects, implemented in a computer system,
whereby the classifying is done dynamically by observing modifications to
references to objects by operations performed in the computer system.

Preferably, according to a method of the present invention, each object is
provided with an instrumentation data structure to enable observation of
modifications to references to objects. According to a preferred embodiment, this
instrumentation data structure comprises at least a thread identification tag for
identifying whether an object can be reached by only one thread or by more than
one thread.

In one embodiment of the present invention the above method is
implemented as a computer implemented method for detecting inconsistent dynamic
concurrency state transitions, especially data races, in the execution of multi-threaded
programs which are amenable to object reachability analysis. For example, strictly
object oriented programs are amenable to object reachability analysis, but the
present invention is not limited thereto. By strictly object oriented is meant that the
programming language has a strict notion of object, i.e. a reference or a handle is
the only way to reach an object; pointers to an object are not used. An example is a
program written in the Java™ language. The present invention may also be
applied to programs written in other languages which use pointers, such as the C
language for example.

According to a method of the present invention, objects currently instantiated
are classified in a set of global objects, for short the global set, containing objects
that can be reached by multiple threads, and sets of local objects, for short the local
sets, containing objects that can only be reached by one thread. The global set and
the local sets are subsets of the total set of objects and are updated during the
program's execution. When an object is local, i.e. a member of the local set of a thread,
it can never be involved in a data race. Only the global set is observed for
determining occurrence of inconsistent concurrency transitions such as data races,
and these occurrences are reported.

Each local set is associated with exactly one thread. When an object is
created by a thread, this object is initially a member of the local set associated with
this thread. When an operation is performed that inserts a reference to a local object
into a global object, the local object is removed from its local set and stored in the
global set, thereby becoming a global object.
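The transition from a local set to the global set can be sketched in Java as follows.
This is only a minimal illustration under stated assumptions: the class TrackedObject,
the tag value GLOBAL and the methods storeReference and makeGlobal are hypothetical
names, thread identifiers are assumed to be non-negative, and the sketch additionally
assumes that every object reachable from a newly global object is moved to the global
set as well.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical instrumented object carrying a thread identification tag.
class TrackedObject {
    static final long GLOBAL = -1L; // assumed tag value: reachable by more than one thread

    private long tag;                                             // owning thread id, or GLOBAL
    private final List<TrackedObject> fields = new ArrayList<>(); // outgoing references

    // A newly created object starts in the local set of its creating thread.
    TrackedObject(long creatingThreadId) {
        this.tag = creatingThreadId;
    }

    boolean isGlobal() {
        return tag == GLOBAL;
    }

    long tag() {
        return tag;
    }

    // Observed operation: a reference to 'target' is stored into this object.
    void storeReference(TrackedObject target) {
        fields.add(target);
        if (isGlobal()) {
            target.makeGlobal(); // a local object referenced from a global object becomes global
        }
    }

    // Moves this object and (by assumption) everything reachable from it to the global set.
    void makeGlobal() {
        Deque<TrackedObject> work = new ArrayDeque<>();
        work.push(this);
        while (!work.isEmpty()) {
            TrackedObject o = work.pop();
            if (o.isGlobal()) {
                continue; // already global; this also terminates on cyclic references
            }
            o.tag = GLOBAL;
            work.addAll(o.fields);
        }
    }
}
```

Storing a reference between two objects of the same local set changes nothing; only a
store into an object that is already global triggers the reclassification.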

As a multi-threaded program executes, references to objects may be
dropped. As such, a global object that was once reachable by multiple threads can
once again become reachable by one thread only. To detect this during execution of
the program, all objects in the global set are analysed and possibly reassigned to a
local set if the thread associated with this local set has exclusive access to the
object. This reassignment can be performed at the programmer's discretion or
automatically. By the combination of not checking operations on local objects for
data races and reassigning objects to local sets, the execution time overhead of
data race detection is reduced.
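One possible way to picture the reassignment step is a reachability pass per thread,
sketched below in Java. The sketch is hypothetical: HeapNode, ReassignmentSketch and
reassign are assumed names, each thread is assumed to expose a list of root
references, and roots that every thread can reach (such as static fields) are left
out for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heap object with a thread identification tag.
class HeapNode {
    static final long GLOBAL = -1L;  // assumed tag value for the global set
    long ownerTag = GLOBAL;          // owning thread id, or GLOBAL
    final List<HeapNode> refs = new ArrayList<>();
}

class ReassignmentSketch {
    // Retags every object reachable from the given roots: an object reached from the
    // roots of exactly one thread goes to that thread's local set, all others are global.
    static void reassign(Map<Long, List<HeapNode>> rootsByThread) {
        Map<HeapNode, Integer> reachCount = new IdentityHashMap<>();
        Map<HeapNode, Long> lastThread = new IdentityHashMap<>();

        for (Map.Entry<Long, List<HeapNode>> entry : rootsByThread.entrySet()) {
            long threadId = entry.getKey();
            Set<HeapNode> visited =
                    Collections.newSetFromMap(new IdentityHashMap<HeapNode, Boolean>());
            Deque<HeapNode> work = new ArrayDeque<>(entry.getValue());
            while (!work.isEmpty()) {
                HeapNode node = work.pop();
                if (!visited.add(node)) {
                    continue; // count each object at most once per thread
                }
                reachCount.merge(node, 1, Integer::sum);
                lastThread.put(node, threadId);
                work.addAll(node.refs);
            }
        }

        for (Map.Entry<HeapNode, Integer> entry : reachCount.entrySet()) {
            HeapNode node = entry.getKey();
            node.ownerTag = entry.getValue() == 1 ? lastThread.get(node) : HeapNode.GLOBAL;
        }
    }
}
```

Such a pass touches every reachable object, so running it has a cost of its own.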

The frequency of the above mentioned reassignment is subject to a trade-off.
If the time between reassignments is increased, the number of objects in the global
set that are in fact only reachable by one thread increases accordingly. Therefore,
the time to observe and analyse these global objects (as well as the memory
required to store the results of the analysis) also increases. On the other hand, if the
time between reassignments is made very short, the number of objects that are
unnecessarily kept in the global set is small and little time is lost while observing and
analysing the global objects. But this reassignment procedure absorbs processing
time, thus slowing down the operation overall. An optimum can be obtained between
the number or frequency of reassignments and the time for analysis.

In order to make race detection possible, each object created during
execution of the multi-threaded program is provided with a special data structure.
This data structure logs specific information about the thread.

In a second aspect of the invention, a data structure, called an accordion
clock, is maintained to determine whether two events can execute in parallel. An
accordion clock is a refinement of a vector clock that takes into account the fact that
threads are created and destroyed dynamically and adapts the dimension of the
accordion clock in response thereto.
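The idea of a clock whose dimension follows the set of threads can be sketched in Java
as a vector clock stored in a map. This is a simplified illustration, not the clock
data structure of the invention: the class AccordionClockSketch and its methods are
hypothetical names, a missing component is assumed to count as zero, and the sketch
assumes that a component is only removed once no event of the destroyed thread still
has to be compared.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vector clock with one component per currently known thread.
class AccordionClockSketch {
    private final Map<Long, Integer> ticks = new HashMap<>();

    // Thread creation: the clock grows by one component.
    void addThread(long threadId) {
        ticks.putIfAbsent(threadId, 0);
    }

    // Thread destruction: the clock shrinks by one component.
    void removeThread(long threadId) {
        ticks.remove(threadId);
    }

    // A local event of the given thread advances its own component.
    void tick(long threadId) {
        ticks.merge(threadId, 1, Integer::sum);
    }

    // Synchronisation: take the component-wise maximum with another clock.
    void mergeFrom(AccordionClockSketch other) {
        other.ticks.forEach((threadId, value) -> ticks.merge(threadId, value, Integer::max));
    }

    int dimension() {
        return ticks.size();
    }

    // True if every component of this clock is <= the matching component of 'other'.
    boolean notAfter(AccordionClockSketch other) {
        for (Map.Entry<Long, Integer> entry : ticks.entrySet()) {
            if (entry.getValue() > other.ticks.getOrDefault(entry.getKey(), 0)) {
                return false;
            }
        }
        return true;
    }

    // Two events can execute in parallel if neither clock is ordered before the other.
    static boolean parallel(AccordionClockSketch a, AccordionClockSketch b) {
        return !a.notAfter(b) && !b.notAfter(a);
    }
}
```

Two events whose clocks are not ordered in either direction are candidates for a race;
after a synchronisation merges one clock into the other, the events become ordered.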
The method of the present invention can be used as a debugging tool, and
may be used to indicate potential data race problems in a program. Based on a
report of potential data races a programmer can then force fragments of code
running on different threads to execute in a certain order by adding extra
synchronisation between these threads. In other embodiments of the present
invention the method is implemented in a compiler, in an interpreter and in a
garbage collector.

The present invention also includes a computer system comprising: 
means for observing modifications to references to objects by operations performed
in the computer system when executing multi-threaded programs; 
means for dynamically classifying the objects into a set of global objects, containing 
objects that can be reached by more than one thread, and a set of local objects, 
containing objects that can only be reached by one thread based on the output of 
the observing means. 

The present invention also includes a computer system for detecting
inconsistent dynamic concurrency state transitions in the execution of multi-threaded
programs amenable to object reachability analysis, comprising:
means for executing multiple threads on the computer system;
means for at least periodically during execution of the threads classifying
instantiated objects into a set of global objects (503; 1508), containing objects that
can be reached by more than one thread, and a set of local objects (504; 1505,
1506, 1507), containing objects that can only be reached by one thread; and
means for recording in a memory concurrency state transition information of global
objects.

The present invention also includes a computer system for determining the
order of events in the presence of a dynamically changing number of threads of a
computer program executable on the computer system having a memory,
comprising:
a clock data structure (601) maintained in memory, the dimension of the clock data
structure (601) being determined dynamically dependent upon the number of
threads created and destroyed during execution of the program; and
means for determining from the clock data structure (601) the occurrence of two
events in parallel during execution of the threads.

The present invention also includes a computer language compiler
mechanism for converting a multi-threaded source program described by a program
language into a computer executable machine language for a computer system,
comprising:
means for receiving the source program;
means for analysing the source program to produce object information;
means for classifying the object information into a set of global objects, containing
objects that can be reached by more than one thread, and a set of local objects,
containing objects that can only be reached by one thread, whereby the classifying
is done dynamically by observing modifications to references to objects required
during the execution of the source program.

The present invention also includes a computer language compiler
mechanism for converting a multi-threaded source program described by a program
language into a computer executable machine language for a computer system
having a memory, comprising:
means for receiving the source program;
means for determining the order of events in the presence of a dynamically
changing number of threads of the machine language program when executed on a
computer, the order determining means comprising:
a clock data structure (601) to be maintained in the memory, the dimension of the
clock data structure (601) being determined dynamically dependent upon the
number of threads which would be created and destroyed during execution of the
machine language program on the computer; and
means for determining, from the clock data structure (601), the occurrence of two
events which would occur in parallel during execution of the threads on the
computer.

The form of the above compiler is not considered as a limitation on the
present invention. For example, any of the above compiler mechanisms may be
implemented as conventional compilers, just-in-time or on-the-fly compilers, or hybrid
compilers. They may also be implemented as add-on programs to existing
compilers.

The present invention also includes a garbage collector mechanism for use 
in a computer system running a multi-threaded program, comprising: 
means for observing modifications to references to objects by operations performed 
in the computer system when executing a multi-threaded program;
means for dynamically classifying the objects into a set of global objects, containing
objects that can be reached by more than one thread, and a set of local objects,
containing objects that can only be reached by one thread based on the output of 
the observing means; the garbage collector mechanism being adapted to selectably 
carry out garbage collection only on the set of local objects. 

The above garbage collector mechanism may be implemented as an integral
part of a garbage collector or may be implemented as an add-on feature to an
existing garbage collector. Typically, the garbage collector will be implemented as
an incremental garbage collector.

The present invention also includes an interpreter mechanism for receiving a
multi-threaded source program written in a programming language and for
outputting machine language instructions to a processing unit, comprising:
means for observing modifications to references to objects by operations performed
when executing the multi-threaded program on the processing unit;
means for dynamically classifying the objects into a set of global objects, containing
objects that can be reached by more than one thread, and a set of local objects,
containing objects that can only be reached by one thread based on the output of
the observing means.

The present invention also includes an interpreter mechanism for receiving a
multi-threaded source program written in a programming language and for
outputting machine language instructions to a processing unit, comprising:
means for at least periodically during execution of the multi-threaded source
program classifying instantiated objects into a set of global objects (503; 1508),
containing objects that can be reached by more than one thread, and a set of local
objects (504; 1505, 1506, 1507), containing objects that can only be reached by one
thread; and
means for recording in a memory concurrency state transition information of global
objects.
The present invention also includes an interpreter mechanism for receiving a
multi-threaded source program written in a programming language and for
outputting machine language instructions to a processing unit, comprising:
a clock data structure (601) maintained in memory, the dimension of the clock data
structure (601) being determined dynamically dependent upon the number of
threads created and destroyed during execution of the source program; and
means for determining from the clock data structure (601) the occurrence of two
events in parallel during execution of the threads.

Any of the above interpreter mechanisms may be implemented as a virtual
machine. The interpreter mechanisms according to the present invention may be
included as an integral part of an interpreter or may be included as an add-on to an
existing interpreter.

The present invention also includes a computer program product comprising:
instruction means for observing modifications to references to objects by operations
performed in the computer system when executing multi-threaded programs; and
instruction means for dynamically classifying the objects into a set of global objects,
containing objects that can be reached by more than one thread, and a set of local
objects, containing objects that can only be reached by one thread based on the
output of the observing means.

The present invention also includes a computer program product for
detecting inconsistent dynamic concurrency state transitions in the execution of
multi-threaded programs amenable to object reachability analysis, comprising:
instruction means for executing multiple threads on a computer system;
instruction means for at least periodically during execution of the threads classifying
instantiated objects into a set of global objects (503; 1508), containing objects that
can be reached by more than one thread, and a set of local objects (504; 1505,
1506, 1507), containing objects that can only be reached by one thread; and
instruction means for recording in a memory concurrency state transition information
of global objects.

The present invention also includes a computer program product for
determining the order of events in the presence of a dynamically changing number
of threads of a computer program executable on the computer system having a
memory, comprising:
instruction means for maintaining a clock data structure (601) in memory, the
dimension of the clock data structure (601) being determined dynamically
dependent upon the number of threads created and destroyed during execution of
the program; and
instruction means for determining from the clock data structure (601) the occurrence
of two events in parallel during execution of the threads.

Any of the above computer program products may be stored on suitable
data carriers such as hard discs, diskettes, CD-ROMs or any other suitable media.
The computer program product may also be downloaded via a suitable
telecommunications network such as a Local Area Network, a Wide Area Network,
the Internet or a telephone network. The present invention includes temporarily storing
a part or whole of the computer program product at intermediate nodes of a
telecommunications network, such as a data carrier network or a public telephone
network.

Other features and advantages of the present invention will become
apparent from the following detailed description, taken in conjunction with the
accompanying drawings, which illustrate, by way of example, the principles of the
invention.

The detailed description is given for the sake of example only, without
limiting the scope of the invention. The reference figures quoted below refer to the
attached drawings.



Brief description of the drawings 

Fig. 1 is a schematic overview of a method according to an embodiment of
the present invention for detecting data races of multiple threads executing source
programs.

Fig. 2 is a schematic representation of a heap constructed during an
execution of class files.

Fig. 3 is a schematic representation of an object.

Fig. 4 is a schematic representation of extra instrumentation of every object.

Fig. 5 is a schematic representation of a division of a heap in global and local
objects.

Fig. 6 is a schematic representation of an accordion clock in accordance with
an embodiment of the present invention.

Fig. 7 is a diagrammatic illustration of a sequential order of events in one
thread.

Fig. 8 is a diagrammatic illustration of synchronisations using the Thread
class.

Fig. 9 is a diagrammatic illustration of synchronisations through a locked
object.

Fig. 10 is a diagrammatic illustration of synchronisations through signals.

Fig. 11 is a schematic representation of a thread information structure.

Fig. 12 is a schematic representation of a lock information structure.

Fig. 13 is a schematic overview of a subdivision of a heap into a plurality of
local sets and a global set.

Fig. 14 is a schematic overview of the instrumentation of an object to perform
full data race detection according to the present invention.

Fig. 15 illustrates an example of a data race.

Fig. 16 is a block diagram of a typical computer system in which the present
invention may be embodied.

Fig. 17 is a schematic representation of a comparison of the working of a
compiler (Fig. 17a) and an interpreter (Fig. 17b).

Description of the illustrative embodiments 

The present invention will be described with respect to particular
embodiments, for example written in the Java™ programming language, and with
reference to certain drawings, but the invention is not limited thereto but only by the
claims. Other programming languages may be used with the present invention, e.g.
SmallTalk™, or any other strict object oriented programming language where a
reference is the only way to reach an object. In addition, the technique can be
applied to programs written in non-strictly object oriented languages as long as
these can be analysed to establish dynamically which threads can access parts of
the data in the program and which threads cannot do this. For example, the present
invention may also be applied to programs written in a language such as C or C++.

The preferred embodiments of the present invention are implemented on a
computer system. In particular, the preferred embodiments of the method of the
present invention comprise steps performed by a computer system executing a
software program.

Fig. 16 is a simplified block diagram of a computer system 910 which can be
used for the computer system in which the method of the present invention may be
embodied. The computer system configuration illustrated at this level is
general-purpose, and as such, Fig. 16 is labeled "Prior Art." A computer system such as
system 910, suitably programmed to embody the present invention, however, is not
prior art. The specific embodiments of the invention are embodied in a
general-purpose computer system such as shown in Fig. 16, and the remaining description
will generally assume this environment.

In accordance with known practice, a computer system 910 includes at least
one processor 912 that may communicate with a number of peripheral devices via a
bus subsystem 915. These peripheral devices typically include a memory
subsystem 917, a user input facility 920, a display subsystem 922, output devices
such as a printer 923, and a file storage system 925. Not all of these peripheral
devices need to be included for all embodiments of the invention.

The term "bus subsystem" is used generically so as to include any
mechanism for letting the various components of the system communicate with each
other as intended. The different components of the computer system 910 need not
be at the same physical location. Thus, for example, portions of the file storage
system could be connected via various local-area or wide-area network media,
including telephone lines. Similarly, the input devices and display need not be at the
same location as the processor.

Bus subsystem 915 is shown schematically as a single bus, but a typical
system has a number of buses such as a local bus and one or more expansion
buses (e.g., ADB, SCSI, ISA, EISA, MCA, NuBus, or PCI), as well as serial and
parallel ports, Ethernet cards etc. Network connections are usually established
through a device such as a network adapter on one of these expansion buses or a
modem on a serial port. The computer system may be a desktop system or a
portable system or an embedded controller.

Memory subsystem 917 includes a number of memories including a main
random access memory ("RAM") 930 and a read only memory ("ROM") 932 in which
fixed instructions are stored. In the case of Macintosh-compatible personal
computers this would include portions of the operating system; in the case of
IBM-compatible personal computers, this would include the BIOS (basic input/output
system). In some embodiments, a DMA controller 931 may be included. DMA
controller 931 enables transfers from or to memory without going through processor
912.

User input facility 920 typically includes a user interface adapter 939 for
connecting a keyboard and/or a pointing device 941 to bus subsystem 915. The
pointing device 941 may be an indirect pointing device such as a mouse, trackball,
touchpad, or graphics tablet, or a direct pointing device such as a touch screen
device incorporated into the display.

Display subsystem 922 typically includes a display controller 943 for
connecting a display device 944 to the bus subsystem 915. The display device 944
may be a cathode ray tube ("CRT"), a flat-panel device such as a liquid crystal
display ("LCD") or a gas plasma-based flat-panel display, or a projection device. The
display controller 943 provides control signals to the display device 944 and
normally includes a display memory 945 for storing the pixels that appear on the
display device 944.

The file storage system 925 provides persistent (non-volatile) storage for
program and data files, and includes an I/O adapter 950 for connecting peripheral
devices, such as disk and tape drives, to the bus subsystem 915. The peripheral
devices typically comprise at least one hard disk drive 946 and at least one floppy
disk drive ("diskette") 947. One or more of the hard disk drives 946 may be in the
form of a redundant array of independent disks ("RAID") system, while others may be
more conventional disk drives. The hard disk drive 946 may include a cache
memory subsystem 948 which includes fast memory to speed up transfers to and
from the hard disk drive. There may also be other devices such as a CD-ROM drive
949 and optical drives. Additionally, the system may include hard drives of the type
with removable media cartridges. As noted above, one or more of the drives may be
located at a remote location, such as in a server on a local area network or at a site
on the Internet's World Wide Web.

Those skilled in the art will appreciate that the hardware depicted in Fig. 16
may vary for specific applications. For example, other peripheral devices such as
audio adapters may be utilised in addition to the hardware already depicted. Also,
other peripheral devices may be utilised in place of the hardware depicted.

Java™ is an object oriented language that was designed for writing
multi-threaded applications. In Java™ there are only two fundamental data types: primitive
types and reference types. Primitive types comprise booleans, integers, floating
points, etc. Reference types comprise a reference to an object or contain 'null'.
These objects are created dynamically on a type of memory known as a heap. In the
usual implementation of Java™ a garbage collector is responsible for removing them
when they are no longer referenced. Objects themselves can contain primitive types
or references.

A race between two (or more) threads occurs when they modify a member
variable of an object in an unpredictable order. Races on variables on a stack in
Java™ are impossible since the stack can only be manipulated by the thread to
which it belongs.

Fig. 1 is a schematic overview of a method 100 that can be used for 
detecting data races of multiple threads executing, e.g., Java™ source programs 
101 as defined by K. Arnold and J. Gosling in "The Java programming language" 
(Addison-Wesley, 1996).

A programmer or an automatic development tool produces Java™ source
code 101. The Java™ source code 101, once processed, is intended to execute
concurrently in a computer system (CPU) 107 as described above. Such a computer
system 107 may be a workstation, a personal computer or a main frame computer,
for example. The computer system 107 comprises a memory and a processor, or
multiple memories and processors used in conjunction. In accordance with an
embodiment of the present invention, Java™ source code 101 can be compiled by a
compiler 102 into class files 103, i.e. a type that defines the implementation of a
particular kind of object, containing bytecodes as described by Tim Lindholm and
Frank Yellin in "The Java virtual machine specification" (Addison-Wesley, 1997).
These class files 103 are then executed in the computer system 107 by means of an
augmented interpreter 104 in accordance with an embodiment of the present
invention. This augmented interpreter 104 consists of a general bytecode interpreter
110 augmented with a monitor 111. The monitor may be an integral part of the
interpreter or may be an add-on to a standard interpreter. The augmented
interpreter 104 is loaded into the system 107 by a loader 105. The function of the
monitor 111 is to produce a report 109 on concurrency state information concerning
concurrently executing threads. The report may contain information on data races
occurring while the class files 103 are executing in the system 107. More
specifically, when the class files 103 are being executed, four activities of the
monitor 111 can be discerned, each of which is described in more detail hereinafter:
1. Every object that is created by the interpreter 110 is instrumented to
enable data race detection. This does not mean that every object is monitored.

2. All forms of synchronisation present in the program are analysed in order
to find parts of code that are executing in parallel. A logical order between events is
established. An event is classified in one of three classes: either ordered before or
after another event, or parallel with it. Only parallel events can
be involved in a data race. A special data structure called an 'accordion clock' is
described that is used to determine the order of events in the presence of a
dynamically varying number of threads.

3. A number of sets of objects are maintained (as can be seen in Fig. 5). One
set 503 of global objects that are potentially reachable by multiple threads is
maintained. Furthermore, for every thread, a set 504 of local objects only reachable
by this thread is maintained. In accordance with the present invention, only objects
from the global set 503 will have to be analysed extensively to find inconsistent
concurrency state transitions such as data races.

4. All bytecodes that read or write to a member variable are analysed to find
inconsistent concurrency state transitions such as data races. For example, if two
bytecodes modify a member variable of an object in the global set and these
bytecodes execute in parallel as indicated by their logical order, then a data race is
reported.

Instrumentation of Objects

While the augmented interpreter 104 executes, a heap 201 of objects 202 is
constructed (see Fig. 2). A heap 201 is an area of memory used for dynamic
memory allocation where blocks of memory are allocated and freed in an arbitrary
order.

Three types of objects can be discerned: objects of type Class 203, array
objects 204 and other objects 205, such as dates, linked lists, windows,
scrollbars and sockets, i.e. all common objects used in a modern program. As
represented in Fig. 3, an object 301 comprises code 302 and data 303. Data 303
can be split into:

- references 304 to other objects called the children 307 of the object 301,

- other data 305 that can comprise booleans, shorts, integers, etc.,

- a lock 306 that can be taken by a thread to gain exclusive access to some
resource.

According to the present invention, for every object 202, 301, 401, extra 
memory space is allocated for storing an instrumentation data structure 404, as can 
be seen in Fig. 4. The fields in this data structure 404 are: 

- Lock information address 405 (lockInfAddr). This is a pointer that is used to attach
a larger data structure, the lock information structure 409 (lockInfStruct), when the
object 401 is being locked for the first time. An object 401 can be locked by a thread.
Only one thread at a time can obtain a lock. By holding the lock, the thread can
exclude other threads from using the same shared resource. Once the activity for
which the lock has been obtained is completed, the lock is released. If the lock is
already held by another thread, the thread trying to obtain the lock is put onto a
waiting list or in a wait state. If the lock is released, one of the threads waiting on the
waiting list is allowed to acquire the lock. The exact layout and the use of the lock
information structure 409 is explained hereunder. At object creation, no lock
information structure 409 is attached yet to the lock information address 405.

- Thread information address 406 (thrInfAddr). This pointer is used when the program
deals with an object of type Thread (or a subtype thereof) and not just with a general
object. In this case, the thread information address 406 is the address of the thread
information structure 410. Thread objects are Java's interface to the actual
executing thread. The exact layout and the use of the thread information structure
410 is explained hereunder. At object creation, no thread information structure 410
is attached yet to the thread information address 406.

- Thread identification 407 (TID). The thread identification 407 is used to record to
which set, as described in Fig. 5, the object 401 belongs. An object 401, 502 may
belong to a global set 503 or to a local set 504. Local sets 504 contain objects that
can only be reached by one thread. The global set 503 contains objects that may be
reachable by more than one thread. If an object 502 is member of the local set 504
of a thread, the thread identification 407 contains the thread identification of this
thread. If, on the other hand, the object is member of the global set 503, the thread
identification 407 contains a value that can never be assigned to a running thread
(for example the value -1). The exact function of the thread identification 407 is
explained hereunder. At object creation, the thread identification 407 is initialised to
the thread identification of the thread that created the object 401.

- Object information address 408 (objInfAddr). This is a pointer that is used to attach
a larger data structure, the object information structure 411 (objInfStruct), in case
the program deals with an object 401 which is member of the global set 503. The
exact layout and the use of the object information structure 411 is explained
hereunder. At object creation, no object information structure 411 is attached yet to
the object information address 408 except when an object of type Class is dealt
with.
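
By way of illustration only, the four fields of the instrumentation data structure 404 may be sketched in Java as follows. The class name, the field types and the helper methods are assumptions made for this sketch; the description above only fixes the role of each field and the use of a value such as -1 to mark membership of the global set 503.

```java
// Hypothetical sketch of instrumentation data structure 404; names are illustrative only.
final class Instrumentation {
    /** Value that can never be assigned to a running thread; marks the global set 503. */
    static final int GLOBAL = -1;

    Object lockInfStruct; // reached via lockInfAddr 405; null until the object is first locked
    Object thrInfStruct;  // reached via thrInfAddr 406; null until a Thread object is started
    int tid;              // TID 407: identification of the owning thread, or GLOBAL
    Object objInfStruct;  // reached via objInfAddr 408; null while the object is local

    Instrumentation(int creatingThreadId) {
        // At object creation the TID is initialised to the creating thread.
        this.tid = creatingThreadId;
    }

    boolean isGlobal() { return tid == GLOBAL; }

    boolean isLocalTo(int threadId) { return tid == threadId; }

    void makeGlobal() { tid = GLOBAL; }
}
```

In this sketch the larger structures are attached lazily, so an object that is never locked and never leaves its local set carries only the four small fields.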



Determining Logical Order 

The purpose of determining a logical order between events is to determine
whether two events could have been performed in an unpredictable order. If two
events are unordered and both events access shared data and at least one of the
events modifies this shared data, then a data race occurs between these two events
on the shared data.

To avoid data races, a programmer can force fragments of code running on
different threads to execute in a certain order by adding extra synchronisation
between these fragments. The fragments of code of a thread that are separated
from each other by a synchronisation operation are commonly called events. The i-th
event of thread $T_t$ will be denoted in the present description by $e_{t,i}$. A data race
occurs when there is no set of synchronisations that force the events modifying a
shared variable to occur in a fixed order.



Vector Clocks

The present invention models the ordering of events by using a construct
called a vector clock as defined by R. Schwarz and F. Mattern in "Detecting causal
relationships in distributed computations: in search of the holy grail" (Distributed
Computing, p.149-174, 1994) and by C.J. Fidge in "Partial orders for parallel
debugging" (In Proceedings of the ACM SIGPLAN and SIGOPS Workshop on
parallel and distributed debugging, p.183-194, May 1988).

Vector clocks are used in distributed systems to determine whether pairs of
events are causally related. Timestamps are generated for each event in the
system, and a causal relationship is determined by comparing these timestamps.
Each process assigns a timestamp to each event. Vector clocks are tuples of
integers with a dimension equal to the maximum degree of parallelism (number of
threads) in the application. In a system made up of n processes (n threads), each
process keeps a vector clock with n slots. Each integer value of a vector clock
corresponds to a thread in the application and is called a scalar clock value of that
thread. The first event, $e_{t,0}$, of every thread $T_t$ is assigned the vector clock



The value of the vector clock of a next event in a thread is calculated using
the vector clocks of its preceding events. If event $e_{t,i}$ on thread $T_t$ is ordered after
events $E = \{e_0, \ldots, e_n\}$, then

$$VC(e_{t,i})_j = \begin{cases} (\max E)_j, & j \neq t \\ (\max E)_j + 1, & j = t \end{cases}$$

where $(\max E)_j = \max_{e \in E} VC(e)_j$ denotes the component-wise maximum of the
vector clocks of the events in E.

The most important property of vector clocks, for the purposes of the present
invention, is that they can be used to verify whether two events are ordered by a
path of synchronisations. Two events, a and b, are ordered if and only if

$$a \rightarrow b \equiv (\forall i \,.\, VC(a)_i \leq VC(b)_i) \wedge (\exists i \,.\, VC(a)_i < VC(b)_i)$$

If the thread identification numbers, i and j, of two different threads, $T_i$ and $T_j$, on
which the events, a and b, occurred, are known, then an important optimisation is
possible:

$$a \rightarrow b \equiv VC(a)_i \leq VC(b)_i$$

Two events are parallel, i.e. not ordered, if and only if

$$a \parallel b \equiv \neg(a \rightarrow b) \wedge \neg(b \rightarrow a)$$

If the set of all locations written to during event a is defined as W(a) and the set of all
locations read during event a is defined as R(a), then two events, a and b, will be
involved in a data race if and only if

$$(a \parallel b) \wedge \big( (W(a) \cap W(b) \neq \emptyset) \vee (R(a) \cap W(b) \neq \emptyset) \vee (W(a) \cap R(b) \neq \emptyset) \big)$$
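
The vector clock rules above may be sketched in Java as follows. This is a minimal illustration, not the claimed implementation: the class and method names are hypothetical, clocks are plain `int` arrays indexed by thread number, memory locations are represented by strings, and the full component-wise comparison is used rather than the single-component optimisation.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; all names are illustrative only.
final class VectorClocks {
    private VectorClocks() {}

    /** Clock of a new event on thread t that is ordered after the events whose clocks are given. */
    static int[] next(int t, List<int[]> preceding, int n) {
        int[] vc = new int[n];
        for (int[] p : preceding) {
            for (int j = 0; j < n; j++) {
                vc[j] = Math.max(vc[j], p[j]); // component-wise maximum over E
            }
        }
        vc[t] += 1; // the scalar clock of the event's own thread advances
        return vc;
    }

    /** True if a is ordered before b: no component larger, at least one strictly smaller. */
    static boolean orderedBefore(int[] a, int[] b) {
        boolean strictlySmaller = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlySmaller = true;
        }
        return strictlySmaller;
    }

    /** True if neither event is ordered before the other. */
    static boolean parallel(int[] a, int[] b) {
        return !orderedBefore(a, b) && !orderedBefore(b, a);
    }

    /** Data race: parallel events with a write/write or read/write conflict on some location. */
    static boolean race(int[] vcA, Set<String> readA, Set<String> writeA,
                        int[] vcB, Set<String> readB, Set<String> writeB) {
        boolean conflict = !Collections.disjoint(writeA, writeB)
                || !Collections.disjoint(readA, writeB)
                || !Collections.disjoint(writeA, readB);
        return parallel(vcA, vcB) && conflict;
    }
}
```

Two events that only read a common location never satisfy `race`, since the conflict test requires at least one write.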



Accordion Clocks

Vector clocks have one major drawback: for every new thread, a new
position in the vector clock is needed. Hence, the dimensionality of a vector clock is
the maximum number of threads created by a program.

For FTP servers, browsers, etc. which, for every new job that must be
performed, dynamically create a new thread, this means that the vector clocks grow
excessively large. It is to be noted, however, that for this type of application the
number of threads that are concurrently active is usually much lower than the total
number of threads created during the lifetime of the application. To exploit this,
'accordion clocks' are constructed in accordance with an embodiment of the
invention that grow and shrink as the need requires.

In Fig. 6, the data structure for accordion clocks is represented. Accordion
clock 601 comprises a lock 602 that can be taken by a thread if exclusive access to
the accordion clock 601 is required. Further, the address 603 of a local clock 604 is
present. This is the address of the data structure that actually contains the accordion
clock data. The local clock 604 comprises a lock field 605, a count field 606, an
array of values 607, a next field 608 and a previous field 609.

A thread can lock the lock field 605 of the local clock 604 to obtain exclusive
access to the local clock 604.

The values of the local clock 604 are maintained in an array of values 607.
This array 607 has the same function as a general vector clock (defined above) but
is generally of a smaller dimension. The local clock 604 can be shared among
multiple accordion clocks 601 and implements copy-on-write semantics. The count
field 606 indicates the number of different accordion clocks that use the local clock
604. As soon as an accordion clock 601 requests a modification of the values 607 of
its local clock 604, a new copy must be made of the local clock 604 and assigned to
the accordion clock 601 making the request. The values 607 can then be updated.
The count field 606 of the new local clock 604 is assigned the value of 1 and the
count field 606 of the old local clock is decremented by 1. When the count field 606
drops to 0, the local clock's space can be reclaimed by the system. The next field
608 and previous field 609 are used to link the local clock 604 in a doubly linked
circular list 610.
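
The copy-on-write sharing of a local clock 604 may be sketched as follows. The sketch is an assumption-laden simplification: the class and method names are hypothetical, the lock fields 602 and 605 and the doubly linked list 610 are omitted, and reclamation of a local clock whose count reaches 0 is left to the Java garbage collector.

```java
// Hypothetical sketch of a shared local clock; locks and list links are omitted.
final class LocalClock {
    int count;          // count field 606: number of accordion clocks using this local clock
    final int[] values; // array of values 607

    LocalClock(int[] values) {
        this.values = values;
        this.count = 1;
    }
}

final class AccordionClock {
    LocalClock local; // stands in for address 603 of the local clock 604

    AccordionClock(int[] initialValues) {
        local = new LocalClock(initialValues.clone());
    }

    private AccordionClock(LocalClock shared) {
        shared.count++; // one more accordion clock uses the same local clock
        local = shared;
    }

    /** Creates a second accordion clock that shares this clock's local clock. */
    AccordionClock share() {
        return new AccordionClock(local);
    }

    /** Modifies one value; a fresh copy of the local clock is made first (copy-on-write). */
    void set(int position, int value) {
        LocalClock old = local;
        local = new LocalClock(old.values.clone()); // the new copy starts with count 1
        old.count--;                                // the old local clock loses one user
        local.values[position] = value;
    }
}
```

Because every modification works on a private copy, accordion clocks that still share the old local clock keep observing the unmodified values.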

An additional global data structure, translation table 611 (tt), is maintained.
This is an array that dynamically grows as threads are created. The length of the
translation table 611 is equal to the total number of threads seen up till the current
point in the execution of the program. The translation table 611 is used to indicate
the position, $tt_i$, in the array of values 607 of the scalar clock of a thread $T_i$.

Accordion clocks 601 are used as follows. When the program starts, only
one thread is active. All accordion clocks 601, $ac_i$, are created with length one:

$$l(ac_i) = 1$$

The translation table 611 is also of length one, with

$$tt_0 = 0$$

indicating that the scalar clock of thread number 0, $T_0$, is at position 0 in the values
arrays 607 of all the local clocks 604.

When a new thread, $T_{new}$, is created, the translation table, tt, 611 is replaced
by a copy, tt', with one extra position at the end:

$$l(tt') = l(tt) + 1$$

All local clocks 604 are enumerated through the linked list 610 and their value arrays
607, $va_i$, are replaced by a copy, $va'_i$, with one extra position at the end. At the extra
position the value zero is stored:

$$l(va'_i) = l(va_i) + 1$$

indicating that no synchronisation with the new thread, $T_{new}$, has occurred yet. At the
new position of the translation table 611, the new position of the extended value
arrays 607 is stored:

$$tt'_{new} = l(va'_i) - 1$$

indicating that this is the position where the scalar clocks of $T_{new}$ are stored in the
values arrays 607.

When an existing thread, $T_{old}$, goes out of scope, which is explained later, the
position of its scalar clock, $tt_{old}$, in the value arrays 607, $va_i$, of the local clocks 604 is
removed. This is done by creating a copy, $va'_i$, which is one position shorter:

$$l(va'_i) = l(va_i) - 1 \qquad (1)$$

Similarly, a new copy (tt') of the translation table 611 must be created that reflects
the fact that the positions of the scalar clocks have shifted.

(2)
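
The growing and shrinking of the value arrays 607 and of the translation table 611 may be sketched as follows. Only the lengths are fixed by the description above; the exact shifting performed under rules (1) and (2) is an assumption of this sketch, namely that positions after the removed one move down by one and that the removed thread's table entry is overwritten by a hypothetical marker `REMOVED`. The class name and the plain list standing in for the linked list 610 are also illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a plain list stands in for the linked list 610 of local clocks.
final class AccordionTable {
    static final int REMOVED = -1; // assumed marker for a thread whose position was removed

    int[] tt = {0};                                     // translation table 611, thread 0 at position 0
    int length = 1;                                     // current length of every value array 607
    final List<int[]> valueArrays = new ArrayList<>();

    /** Extends all value arrays and the translation table by one position; returns the new thread number. */
    int addThread() {
        for (int i = 0; i < valueArrays.size(); i++) {
            // Arrays.copyOf pads with zero: no synchronisation with the new thread yet.
            valueArrays.set(i, Arrays.copyOf(valueArrays.get(i), length + 1));
        }
        tt = Arrays.copyOf(tt, tt.length + 1);
        tt[tt.length - 1] = length; // last position of the extended value arrays
        length++;
        return tt.length - 1;
    }

    /** Removes the scalar clock position of the given thread (assumed form of rules (1) and (2)). */
    void removeThread(int threadNumber) {
        int p = tt[threadNumber];
        for (int i = 0; i < valueArrays.size(); i++) {
            int[] va = valueArrays.get(i);
            int[] shorter = new int[length - 1];
            System.arraycopy(va, 0, shorter, 0, p);                      // positions before p
            System.arraycopy(va, p + 1, shorter, p, length - 1 - p);     // positions after p shift down
            valueArrays.set(i, shorter);
        }
        int[] copy = tt.clone();
        for (int k = 0; k < copy.length; k++) {
            if (tt[k] == p) copy[k] = REMOVED;
            else if (tt[k] > p) copy[k] = tt[k] - 1;
        }
        tt = copy;
        length--;
    }
}
```

With three threads seen and the second one removed, the translation table keeps three entries while every value array holds only two scalar clocks.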

How the size of the local clocks 604 and the size and content of the
translation table 611 are adjusted in response to thread creation and destruction has
been explained above. The use of the accordion clocks 601 as a drop-in
replacement for the behaviour of the vector clocks will now be described.

If a function VA(e) is defined to be the value array assigned to an event, e,
then the first event, $e_{t,0}$, of every thread $T_t$ is assigned an accordion clock 601 with a
value array 607

$$VA(e_{t,0})_j = \begin{cases} 0, & j < l(VA(e_{t,0})) - 1 \\ \ldots \end{cases} \qquad (3)$$

The value array 607 of an accordion clock 601 of a next event in a thread
is calculated using the accordion clocks 601 of its preceding events. If event $e_{t,i}$ on
thread $T_t$ is ordered after events $E = \{e_0, \ldots, e_n\}$, the value array 607, $VA(e_{t,i})$, of the
accordion clock 601 becomes

$$VA(e_{t,i})_j = \begin{cases} (\max E)_j, & j \neq tt_t \\ (\max E)_j + 1, & j = tt_t \end{cases} \qquad (4)$$

where

$$(\max E)_j = \max_{e \in E} VA(e)_j$$

denotes the component-wise maximum of the value arrays 607 of the accordion
clocks 601 in E.

Comparison of two accordion clocks 601 remains the same as when using
vector clocks. Two events, a and b, are ordered if and only if

$$a \rightarrow b \equiv (\forall i \,.\, VA(a)_i \leq VA(b)_i) \wedge (\exists i \,.\, VA(a)_i < VA(b)_i)$$

If the thread identification numbers, i and j, of two different threads, $T_i$ and $T_j$, on
which the events, a and b, occurred are known, then an important optimisation is
again possible:

$$a \rightarrow b \equiv VA(a)_{tt_i} \leq VA(b)_{tt_i}$$

A final point that must be clarified is when a position, i, from the value array
607 can be removed using the rules (1) and (2). This position, i, is the position of the
scalar clock of the thread, $T_j$, with j the thread number of the thread being removed.
It can be removed if and only if two conditions are met. The first condition is that
thread $T_j$ must have finished its execution. The second condition is that there are no
accordion clocks 601 left which were generated as a consequence of an event on
thread $T_j$ through rules (3) and (4).

Order in Java Programs

To avoid data races, a programmer can force fragments of code running on
different threads to execute in a certain order by adding extra synchronisation
between these threads. Java contains several constructs that enforce
synchronisation:

- the sequential execution of code,

- start and join which operate on objects of type Thread,

- locked objects,

- synchronised member functions,

- and wait and notify(All).

There are a few other operations on objects of type Thread that influence
the execution of other threads but which are not taken into consideration since they
are either being removed from the Java APIs or cannot be used to synchronise two
threads: destroy, interrupt, resume and stop.

The most basic form of order in Java is the sequential execution order of
events in one thread, T1 (see Fig. 7). Events e1,1, e1,2, e1,3 and e1,4 are all events of
thread T1. They are separated from each other by some synchronisation operation
1001, 1002, 1003. This synchronisation operation has as a result that other
events will happen in a specific order indicated by the arrows 1004, 1005, 1006.
Since the events e1,j were all performed by the same thread, they will always be
executed in the same sequential order.



Another synchronisation can be seen in Fig. 8. The start member function
701 of objects of type Thread is called by thread T1 to start the execution of a
second thread, T2. When start is invoked on the Thread object of thread T2, a
new thread is created that starts executing the run method 702 of the Thread
object T2 it was created from. This operation creates an order of events. All events
in thread T2, e2,1, are automatically ordered after the events of thread T1 that
preceded the start method call, e1,1. This is indicated by the arrow 703 and is
reflected by the values of the vector clocks.

Similarly, the join member function 705 of Thread objects allows one
thread, T1, to wait for the end of the execution 704 of a second thread, T2. Again,
this imposes an order on the events. All events, e2,1, from thread T2 are ordered
before the events, e1,3, of thread T1 that follow the join. This is indicated by the
arrow 706 and is reflected by the values of the vector clocks.
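
The ordering imposed by start and join can be observed in a short Java program. The following sketch is illustrative only; the class name and the shared field are hypothetical. The field is deliberately accessed without any lock, yet no data race arises because every access is ordered by the start and join operations.

```java
// Hypothetical example; the shared field is accessed without any lock.
final class StartJoinOrder {
    static int shared;

    static int run() {
        shared = 1;                                        // event e1,1: before start
        Thread t2 = new Thread(() -> shared = shared + 1); // event e2,1: body of the second thread
        t2.start();                                        // e2,1 is ordered after e1,1
        try {
            t2.join();                                     // e1,3 is ordered after e2,1
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared;                                     // event e1,3: always observes 2
    }
}
```

If the first thread read the field between the start and the join, that read would be parallel with the write in the second thread and would constitute a data race.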

A lock 306 is associated with every Object in Java, as can be seen in
Fig. 3. A thread, T1, can try to take this lock using the bytecode monitorenter
801, as can be seen in Fig. 9. If it has obtained the lock 306, it can release it through
the bytecode monitorexit 802. When the lock 306 is already held by thread T1
and thread T2 tries to obtain the lock through bytecode monitorenter 803, the
thread T2 will be put on a waiting list until the lock 306 is released. Then T2 will be
rescheduled for execution.

Strictly speaking, this construct does not impose a fixed order on the threads T1 and
T2 involved; it just indicates that there is a critical section between the bytecodes
monitorenter and monitorexit (pairs 801, 802 and 803, 804). It does
suggest that the programmer is aware of a potential race and is using this construct
as synchronisation. This is therefore considered a 'de facto' synchronisation,
depicted in Fig. 9 by a dashed arrow 805. All events before e1,3 also come before
e2,2. This is reflected in the values of the vector clocks of the events.
The synchronized keyword is applied to a subset of the member
functions of a class, the 'monitor'. When a thread invokes one of these member
functions on an object of the synchronised class, Java™ ensures that none of the
other member functions in the monitor is being executed. This is implemented
through the object locking mechanism mentioned above. When a synchronised
member function is executed, the lock of the object containing the member function
is taken. When the member function finishes, the lock is released.

A final set of synchronisation primitives, as represented in Fig. 10, is wait
and notify(All), which are member functions of every Object. When a
thread, T1, invokes wait 1102 on an object, the execution of the thread, T1, is
halted until another thread, T2, executes notify(All) 1106 on that very same
object. At that time the first thread, T1, can continue its execution. This imposes the
order seen in Fig. 10 depicted by the dotted arrow 1109. However, a thread is only
allowed to invoke wait or notify on an object if that thread is owner of the lock
of that object. The wait/notify construct is used to temporarily leave a monitor. So in
reality it suffices to observe the orderings 1108 and 1110 between the
monitorenter 1101, 1103, 1105 and monitorexit 1102, 1104, 1107
depicted by the full arrows 1108, 1110.
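
Synchronised member functions and the wait/notify construct described above may be illustrated by a one-slot mailbox. The sketch is hypothetical in all its names; it merely shows that wait and notifyAll are invoked while the invoking thread owns the lock of the object, so that the resulting order is already visible through the acquisition and release of that lock.

```java
// Hypothetical one-slot mailbox; put and take together form the monitor.
final class Mailbox {
    private Integer slot; // null while the mailbox is empty

    synchronized void put(int value) throws InterruptedException {
        while (slot != null) {
            wait();       // temporarily leaves the monitor until the slot is emptied
        }
        slot = value;
        notifyAll();      // allowed only because this thread owns the lock
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) {
            wait();       // temporarily leaves the monitor until a value arrives
        }
        int value = slot;
        slot = null;
        notifyAll();
        return value;
    }

    /** Starts a producer thread and takes its single value on the calling thread. */
    static int demo() {
        Mailbox box = new Mailbox();
        Thread producer = new Thread(() -> {
            try {
                box.put(42);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int value = box.take();
            producer.join();
            return value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The loops around wait guard against a thread resuming before its condition holds, which is the usual idiom when several threads share one monitor.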

Data Structures for Determining Logical Order

So far, it has been indicated which program constructs in Java™ are 
considered as introducing an order. Now, it is shown how this ordering can be 
generated during the execution of multi-threaded Java™ programs. 

Every thread Ti consists of a sequence of events ei,j separated by
synchronisation operations. When a thread Ti is started, through a call to the
member function start of an object, o, of type Thread (or one of its derived
types), then the instrumentation 404 of this object o is expanded by adding a thread
information structure 410 through the thread information structure address 406, as
represented in Fig. 4.

This thread information structure 410 is illustrated in Fig. 11. The thread
information structure 1301 comprises two fields: an accordion clock 1302 and a
thread identification number 1303.

The accordion clock 1302 is used to indicate the current accordion clock for
the currently executing event, ei,j, on the thread Ti. It is initialised as described by
formula (3). Every time one of the synchronisation operations described
hereinabove occurs, the accordion clock 1302 is updated according to rule (4). This
update is now described in more detail for every synchronisation operation.

A thread start (Fig. 8) involves two threads, for example T1 and T2. The
thread T1 was initially executing event e1,1 and after the start operation 701 the
thread is executing a new event e1,2. The accordion clock 1302 of the newly
executing event, e1,2, is calculated by using rule (4) with E = {e1,1}. Through the
start method call 701, a second thread, T2, is created. This second thread T2 is
initially given an accordion clock 1302 according to rule (3). However, since event e2,1
is ordered after event e1,1, this accordion clock 1302 must be updated immediately
according to rule (4) with E = {e1,1}.

A thread join (Fig. 8) involves two threads, for example T1 and T2. Thread
T2 terminates after its return statement 704, so no new event is started and the
accordion clock 1302 of thread T2 does not need to be updated. Thread T1 on the
other hand was executing event e1,2 and through the join 705 a new event, e1,3, is
started. The accordion clock 1302 is updated according to rule (4) with E = {e1,2,
e2,1}. The accordion clock for event e2,1 is obtained through the accordion clock field
1302 of the thread T2.

To correctly handle the case of synchronisation through object 401 locking
and method synchronisation, a data structure called the lock information structure
409 is used. The lock information structure 409 is described schematically in Fig. 12.
The lock information structure 1401 contains but one field: an accordion clock 1402.
The accordion clock 1402 contains the accordion clock of the last event that was
executed by the last thread that performed a monitorexit on the object to which
the lock information structure 409 is associated through the lock information
structure address 405.

Initially, when an object 401 is created, its lock information structure address
405 is not assigned a lock information structure 409. When an object's lock 306 is
taken, for example as in Fig. 9, there are two separate cases: either it is the first time the
object is being locked or it isn't.

When an object 401 is locked for the first time 801 by a thread T1, be it
through a monitorenter operation or a call to a synchronised member function
of this object, a lock information structure 409, 1401 is assigned to the lock
information structure address 405. Its accordion clock 1402 is initialised to the
accordion clock value 1302 of the thread T1 performing the lock. The locking of the
object is a synchronisation operation that ends event e1,1 and starts e1,2 on thread
T1. The accordion clock 1302 of the thread T1 is updated according to rule (4) with
E = {e1,1}.

Eventually, the lock on this object 401 will be released 802 by thread T1. The
accordion clock 1402 is assigned the current value of the accordion clock 1302 of
the thread T1. This synchronisation operation 802 ends the event e1,2 and starts
event e1,3. The accordion clock 1302 of the thread T1 must therefore be updated
according to rule (4) with E = {e1,2}.

When an object 401 is subsequently locked again 803 by, for example, thread
T2, the accordion clock 1302 of the thread T2 is updated according to rule (4) with
E = {e1,2, e2,1}. The value of the accordion clock of e2,1 is the current accordion clock
1302 of thread T2 and the value of the accordion clock of e1,2 can be found in the
field 1402 of the lock information structure 409.
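
The three locking cases described above may be sketched as follows. This is a simplified illustration under stated assumptions: clocks are plain `int` arrays indexed directly by thread identification number, so the translation table 611 and the copy-on-write local clocks are left out, and all class and method names are hypothetical.

```java
// Hypothetical sketch of the clock updates performed at monitorenter and monitorexit.
final class LockOrdering {
    private LockOrdering() {}

    /** Stands in for thread information structure 1301. */
    static final class ThreadInfo {
        final int id;  // thread identification number 1303
        int[] clock;   // accordion clock 1302 of the currently executing event
        ThreadInfo(int id, int[] clock) { this.id = id; this.clock = clock; }
    }

    /** Stands in for lock information structure 1401. */
    static final class LockInfo {
        int[] clock;   // accordion clock 1402 of the last releasing event
        LockInfo(int[] clock) { this.clock = clock; }
    }

    /** Rule (4) without translation table: component-wise maximum, then +1 for thread t. */
    static int[] rule4(int t, int[]... preceding) {
        int[] result = new int[preceding[0].length];
        for (int[] p : preceding) {
            for (int j = 0; j < result.length; j++) {
                result[j] = Math.max(result[j], p[j]);
            }
        }
        result[t] += 1;
        return result;
    }

    /** Returns the lock information structure, creating it when the object is locked for the first time. */
    static LockInfo monitorEnter(ThreadInfo thread, LockInfo lock) {
        if (lock == null) {
            lock = new LockInfo(thread.clock.clone());                 // first lock: initialise 1402
            thread.clock = rule4(thread.id, thread.clock);             // E = {current event}
        } else {
            thread.clock = rule4(thread.id, lock.clock, thread.clock); // E = {last release, current event}
        }
        return lock;
    }

    static void monitorExit(ThreadInfo thread, LockInfo lock) {
        lock.clock = thread.clock.clone();                             // remember the releasing event
        thread.clock = rule4(thread.id, thread.clock);                 // E = {current event}
    }
}
```

After a release by one thread and a subsequent acquisition by another, the acquiring thread's clock dominates the clock stored in the lock, which is how the 'de facto' order becomes visible to the comparison of clocks.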

The thread identification number field 1303 is assigned a sequence number.
For the first thread, it is assigned 0, the next thread is assigned 1, and so on.

Classifying Local and Global Objects 

In order to detect inconsistent concurrency state transitions such as data
races, it must be verified that none of the read and write operations to the same
variable of an object happen in a non-deterministic order. One approach to doing
this is to observe every bytecode in the Java™ program that reads or modifies data
on the heap. This is very time consuming.

According to an embodiment of the present invention, sets 1504 of local
objects and a set 1508 of global objects on the heap 1510 are constructed, as can
be seen in Fig. 13. The local sets 1504 contain objects that can only be reached by
one thread. The global set 1508 contains objects that may be reached by more than
one thread. Only read and write operations to objects in the global set need to be
observed extensively to determine the occurrence of data races in accordance with
an embodiment of the present invention.

At program start-up, the global set 1508 is empty. An object is made member
of the global set by storing the value -1 in the TID field 407 of the object
instrumentation. There are three ways an object can become a member of the
global set:

The first way occurs when a new class is initialised: a Class object is
created and stored on the heap 1510. This object is immediately stored in the global
set 1508 and remains there until it is destroyed. A Class object represents a class
when it is loaded by the Java™ interpreter. A Class is reachable by every thread
since every thread is able to create an object of this type. This means that a Class
object is always immediately made member of the global set 1508 (all Class
objects are stored in the class set 1509). Inside a class, there are static variables
that can be read and written to. These are, by definition of the Java™ language,
immediately global to all threads.

The second way occurs when a reference, r, to a local object is manipulated
by the bytecodes aastore, putfield or putstatic. Initially an object is
created locally. The only references to it exist on its creating thread's stack. One
way to change the status of an object from local to global is by storing its reference
into a second object. If this second object is reachable by another thread, the
object also becomes reachable by this other thread. At this point, the object could
potentially be involved in a race. If, on the other hand, the second object is solely
reachable by the thread itself and not by another thread, the object remains local.
There is only a small number of bytecodes that can manipulate the reference of an
object. The bytecode aastore stores a reference, r, into a field of an array. The
bytecode putfield stores a reference, r, or a value, v, into a non-static member
variable of an object, o. The bytecode putstatic stores a reference, r, or a value,
v, into a static member variable of an object, o.

If a putfield or putstatic bytecode is used, it is verified whether it is
storing a reference r into the member variables of object o. If not a reference r but
another value v is stored, the global set 1508 and the local sets 1504 are
not modified. If a reference r is stored, then it is checked whether o is global, i.e.
whether its TID field 407 has the value -1. If o is global then the object to which reference r is
pointing, s, is also made global by storing the value -1 in its TID field 407. Next, all
the descendants of the object s are determined. The descendants are all objects
that are reachable from object s through the references 304 contained in s. The
descendants are also made global by storing the value -1 in their TID field 407
(when an object becomes global, all the objects reachable from this new global
object also become global).
The third way is when an object, o, is of type Thread and this object is used
to start a thread. In Java™, threads are started by creating an object containing a
run method. When this object's start method is called, a new thread is created
and starts executing the code in the run method. At thread start, the object o is
reachable by both the new thread and the thread creating the new thread.
Therefore, the Thread object o and all its descendants are made global by storing
the value -1 in their TID field 407.

To improve the accuracy of the classification into a global set 1508 and local
sets 1504, a 'refiner' can be invoked. The refiner's job is to make a more accurate
estimate of the sets 1504 of local objects and set 1508 of global objects on the heap
1510 by removing objects from the global set 1508 that are only reachable by one
thread. The algorithm described is based on a "mark and sweep algorithm" or a
"mark and scan algorithm".

Before the refiner is invoked, all the other threads $T_i$ in the interpreter are
stopped. For every thread $T_i$, a set $S_i$ of all the objects that are reachable from
that thread is created. Every set $S_i$ can be represented by an array of bits, $B_i$.
If a reference, r, is a member of the set $S_i$, the bit $B_{ij}$ at the index, j,
corresponding to the reference r is set to 1. Else it is set to 0. Initially every set
is empty, $S_i = \emptyset$, so $\forall i, \forall j : B_{ij} = 0$.

For every thread, $T_i$, all references present on its stack 1501, $St_i$, are entered
in the set $S_i$. Every class object 1509 is also entered into the set $S_i$. Then all
children of objects pointed to by references present in $S_i$ are entered into $S_i$.
This is repeated until no more references can be added.

These sets $S_i$ are combined into a set of objects reachable from multiple
threads as follows:

$$S_{tot} = \bigcup_{i \neq j} (S_i \cap S_j)$$

$S_{tot}$ is used to refine the general mechanism according to the present invention
after garbage collection. If an object is not present in $S_{tot}$, it is only reachable
from one thread and therefore local. Thus, if a reference, r, occurs in only one set,
$S_k$, and r currently points to a global object, o, then this object o is made local by
storing the value k into its TID field 407. The large data structures that are
necessary to enable data race detection are then removed and the object is marked as
being reachable only by this one thread.
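
The refinement pass described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the classes `RefinerObj` and `RefinerSketch`, the use of `java.util.BitSet` for the bit arrays, and the convention that a thread's identification equals its position in `rootsPerThread` are all hypothetical choices made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Hypothetical heap object: a bit index, a TID tag and its outgoing references. */
class RefinerObj {
    static final int GLOBAL = -1;
    final int index;                       // position j in every bit array B_i
    int tid;                               // owning thread id, or GLOBAL
    final List<RefinerObj> refs = new ArrayList<>();

    RefinerObj(int index, int tid) { this.index = index; this.tid = tid; }
}

class RefinerSketch {
    /** Builds S_i: everything reachable from one thread's roots (stack and class objects). */
    static BitSet reachable(List<RefinerObj> roots, int heapSize) {
        BitSet seen = new BitSet(heapSize);
        Deque<RefinerObj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            RefinerObj o = work.pop();
            if (seen.get(o.index)) continue;
            seen.set(o.index);
            work.addAll(o.refs);           // children are added until nothing new appears
        }
        return seen;
    }

    /** Makes global objects that occur in exactly one S_k local again (TID = k). */
    static void refine(List<RefinerObj> heap, List<List<RefinerObj>> rootsPerThread) {
        int n = heap.size();
        List<BitSet> sets = new ArrayList<>();
        for (List<RefinerObj> roots : rootsPerThread) sets.add(reachable(roots, n));

        BitSet once = new BitSet(n);       // reachable from at least one thread
        BitSet multi = new BitSet(n);      // S_tot: reachable from two or more threads
        for (BitSet s : sets) {
            BitSet both = (BitSet) s.clone();
            both.and(once);
            multi.or(both);
            once.or(s);
        }
        for (RefinerObj o : heap) {
            // Skip local objects, shared objects and unreachable garbage.
            if (o.tid != RefinerObj.GLOBAL || multi.get(o.index) || !once.get(o.index)) continue;
            for (int k = 0; k < sets.size(); k++) {
                if (sets.get(k).get(o.index)) { o.tid = k; break; }
            }
        }
    }
}
```

In this sketch the threads are assumed to be stopped already, and removing the race detection structures of a demoted object is left out.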

Once the refiner has finished its analysis, the stopped threads can be
resumed. The refiner can be called at the same time as a garbage collector or at
any other moment in time when the programmer estimates that a large number of
global objects might become local.

According to a preferred embodiment of the present invention, each time the
garbage collector performs its job, it is followed by the refiner according to the
present invention. A garbage collector must somehow determine, whatever its
underlying algorithm, whether an object is no longer reachable by any thread of the
program. If this is the case, the object can be removed from the heap. Due to the
similarity between the garbage collector and the refiner, the refiner is thus
preferably, but not necessarily, implemented after the garbage collector.

Detecting Data Races

Once an object becomes a member of the global set 1508, it is instrumented
further to allow full data race detection. The instrumentation can be seen in Fig. 14.
An object information structure 411 is built and its address is stored in the object
information address 408. The object structure contains 2 fields:

- the NrMembers field 1602 that contains the number of member variables that are
present in this object and whose accesses must be observed to detect data races,

- the MemberInfArrAddr field 1603 that contains the address of the 'member
information array' 1604.

The member information array 1604 is an array of length NrMembers 1602 of
addresses of 'member information structures' 1606. These member information
structures 1606 are used to maintain an access history recording relevant read and
write operations to the corresponding member variable.
The member information structures 1606 consist of:

- a field description 1607. This describes the member variable that is being
observed. Information that might be contained is, e.g., the type, the name, ... of the
member variable.

- a lock 1608. This lock can be used to obtain exclusive access to the member
information structure 1606.

- a 'read list' address 1609. The address of a doubly linked list of 'read information
structures' 1611. The read information structures 1611 record
information on relevant read operations. Initially, this list is empty.

- a 'write list' address 1610. The address of a 'write information structure' 1618. The
write information structure 1618 records information about the last write operation
performed on this member variable. Initially, this list is empty.
The read information structures 1611 consist of six fields:

- an accordion clock 1612, which is a copy of the accordion clock 1302 of the thread
that performed the read operation.

- a program counter 1613, which is the program counter at the time the read
operation occurred.

- a method 1614, the method that performed the read operation.

- a thread identification number 1615, identifying the thread that performed the read
operation.

- a next field 1616, used to link the read information structure in a doubly linked list
1623.

- a previous field 1617, used to link the read information structure in a doubly linked
list 1623.

The read information structures 1611 describe the most recent read operation 
performed for each separate thread on the corresponding member variable. 
The write information structure 1618 consists of four fields:

- an accordion clock 1619, which is a copy of the accordion clock 1302 of the thread
that performed the write operation.

- a program counter 1620, which is the program counter at the time the write
operation occurred.

- a method 1621, the method that performed the write operation.

- a thread identification number 1622, identifying the thread that performed the write
operation.

The write information structure 1618 describes the most recent write operation 
performed on the corresponding member variable. 
Using the data structures described so far, data race detection is performed
as follows. All bytecodes that read or write data from the heap need to be observed.
These consist of:

- Read operations:

  - {abcdfils}aload which read data from a field of an array,

  - getfield which reads from a member variable of an object,

  - getstatic which reads from a static member variable of an object.

- Write operations:

  - {abcdfils}astore which write data to a field of an array,

  - putfield which writes to a member variable of an object,

  - putstatic which writes to a static member variable of an object.

In addition to the actions that need to be taken to update the global set 1508
and the sets of local objects 1504, the following is performed when a thread $T_i$
executes one of the above bytecodes.

When an opcode is executed that reads a field, a member variable or a static
variable of an object, o, then the TID 407 of o is read. If o is found to be local
($o_{TID} \neq -1$), the interpreter can continue and no race is detected.

If o is found to be a global object, full race detection must be performed. The
index, j, into the member information array 1604 corresponding to the member
variable read is determined. A new read information structure 1611, $Read_{new}$, is
built with the accordion clock 1302 of the executing thread, with the current program
counter 1613, the currently executing method 1614 and the currently executing
thread's thread identification 1615.

If there is already a previous read information structure 1611 present with the
same thread identification number 1615, then this old read information structure
1611 is removed from the read list. The new read information structure 1611 is
inserted into the read list.

If there is already a write operation stored in the write information structure
1618, $Write_{old}$, a race between $Read_{new}$ and $Write_{old}$ is reported if and only if:

$$(Read_{new}.TID \neq Write_{old}.TID) \land (Read_{new}.accordion \parallel Write_{old}.accordion)$$

When an opcode is executed that writes a field, a member variable or a static
variable of an object, o, then the TID 407 of o is read. If o is found to be local
($o_{TID} \neq -1$), the interpreter can continue and no race is detected.

If o is found to be a global object, full race detection must be performed. The
index, j, into the member information array 1604 corresponding to the member
variable written is determined. A new write information structure 1618, $Write_{new}$, is
built with the accordion clock 1302 of the executing thread, with the current program
counter 1613, the currently executing method 1614 and the currently executing
thread's thread identification 1615.

If there is already a previous write operation, $Write_{old}$, stored in the write
information structure 1618, a race between $Write_{new}$ and $Write_{old}$ is reported if and
only if:

$$(Write_{new}.TID \neq Write_{old}.TID) \land (Write_{new}.accordion \parallel Write_{old}.accordion)$$

Furthermore, all previous read operations stored in the read information
structures 1611, $Read_{i,old}$, are analysed with respect to the new write operation. A
race between one of the read operations stored in $Read_{i,old}$ and the new write
operation stored in $Write_{new}$ is reported if and only if:

$$(Write_{new}.TID \neq Read_{i,old}.TID) \land (Write_{new}.accordion \parallel Read_{i,old}.accordion)$$

Finally, the old write information structure 1618, $Write_{old}$, is replaced by the new
write information structure, $Write_{new}$.
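
The read and write checks above can be illustrated with the following sketch. It makes several assumptions that are not taken from the description: a fixed-size `int[]` vector clock stands in for the accordion clock, a `HashMap` keyed by thread identification replaces the doubly linked read list, a `synchronized` method replaces the lock 1608, and the names `AccessInfo` and `MemberInfo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One recorded access: a copy of the thread's clock plus where it happened. */
class AccessInfo {
    final int tid;
    final int[] clock;     // fixed-size vector clock standing in for an accordion clock
    final String method;
    final int pc;

    AccessInfo(int tid, int[] clock, String method, int pc) {
        this.tid = tid;
        this.clock = Arrays.copyOf(clock, clock.length);
        this.method = method;
        this.pc = pc;
    }

    /** True if no component of this clock exceeds the other clock's component. */
    boolean precedes(AccessInfo other) {
        for (int i = 0; i < clock.length; i++) {
            if (clock[i] > other.clock[i]) return false;
        }
        return true;
    }

    /** The "parallel" relation: neither access is ordered before the other. */
    boolean concurrentWith(AccessInfo other) {
        return !precedes(other) && !other.precedes(this);
    }
}

/** Access history of one member variable of a global object. */
class MemberInfo {
    final Map<Integer, AccessInfo> lastReadPerThread = new HashMap<>();
    AccessInfo lastWrite;                  // null until the first write

    private static boolean races(AccessInfo a, AccessInfo b) {
        return a.tid != b.tid && a.concurrentWith(b);
    }

    /** Records a read and returns the old write it races with, if any. */
    synchronized List<AccessInfo> onRead(AccessInfo readNew) {
        List<AccessInfo> found = new ArrayList<>();
        lastReadPerThread.put(readNew.tid, readNew);   // replaces this thread's old read
        if (lastWrite != null && races(readNew, lastWrite)) found.add(lastWrite);
        return found;
    }

    /** Records a write and returns every old read or write it races with. */
    synchronized List<AccessInfo> onWrite(AccessInfo writeNew) {
        List<AccessInfo> found = new ArrayList<>();
        if (lastWrite != null && races(writeNew, lastWrite)) found.add(lastWrite);
        for (AccessInfo readOld : lastReadPerThread.values()) {
            if (races(writeNew, readOld)) found.add(readOld);
        }
        lastWrite = writeNew;              // the old write is replaced by the new write
        return found;
    }
}
```

The returned list lets a caller report each conflicting access together with its method and program counter; all clocks are assumed to have the same length.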

This analysis is carried out until the Java™ program terminates.
While the invention has been shown and described with reference to
preferred embodiments, it will be understood by those skilled in the art that changes
or modifications in detail may be made without departing from the scope and spirit of
this invention.

Implementation

There are different possibilities for implementing the method of the present
invention. These comprise different steps for implementing the method of the
present invention.

Java™ implementation 

The method of the present invention has been described in general
hereinabove. Applied for an implementation in Java™, steps as described
hereinafter are to be taken.
In a first implementation, the method may be implemented in an interpreter.
As shown in Fig. 17b, an interpreter 1707 is a program which executes other
programs. It accepts as input source code 1706, a program text in a certain
language, and executes it directly on a machine 1708. The interpreter 1707
analyses each statement in the program each time it is executed and then performs
the desired action.

Different steps are to be taken for implementing the method of the present
invention in an interpreter.

A first step is to instrument all the synchronisation primitives of Java™ using
vector clocks or accordion clocks, which are an advanced version of vector clocks
that can dynamically grow and shrink as threads are created and destroyed.

A next step is to instrument every object with a minimal data structure that
allows the method of the present invention to be used. When objects are created
using new, newarray, anewarray or multianewarray, they are extended
with an instrumentation data structure consisting of at least 8 bytes extra, e.g. 20
bytes extra. The structure consists of two parts. The first is the thread identification
number (TID). In this field, the TID of the thread that created this object is stored or,
when the object becomes global and is reachable by several threads, -1 is stored.
The second part consists of link fields that will be used to link a much larger data
structure for full data race detection only when the object becomes global.
An object can contain several fields that can be written or read. If a new
global object is instrumented, each field must have its specific data structure that
maintains information about the accesses to that field. This data structure contains
a description of the field being accessed, containing its name and type information;
the location in the code where the last read and write occurred, consisting of a class,
a member function, a thread identification and a Java Virtual Machine (JVM™)
program counter; and finally, a vector clock indicating when the last reads and write
occurred.

Using this data structure, the instructions aastore, putfield and
putstatic are instrumented.

If it is supposed that the bytecode aastore stores a reference, R, into an
array, referred to by reference A, then there are two possibilities:

- If the object pointed to by R is already global (R.TID == -1) then nothing happens;
the object is already being watched for possible data races.

- If, on the other hand, the object is not yet global, the TID of the array referred to
by A is checked. If it is global (A.TID == -1), then by storing R into A, the object
referred to by R also becomes global. Otherwise, if A.TID != R.TID, the reference is
being stored into an array that is reachable by another thread. The object referred to
by R must again be made global.

- If the object referred to by R becomes global, all its children are recursively
checked. Each child that is not yet global is made global. Attention must be paid to
stack overflow when recursively marking a deep data structure as global.

A similar procedure is followed for putfield and for putstatic.
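
The decision taken on a reference store can be sketched as below. The names `TaggedObj`, `StoreBarrierSketch`, `makeGlobal` and `storeReference` are hypothetical, and the explicit worklist is one assumed way of avoiding the stack overflow mentioned above; the description itself speaks of a recursive check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical instrumented object: a TID tag plus the references it holds. */
class TaggedObj {
    static final int GLOBAL = -1;
    int tid;
    final List<TaggedObj> refs = new ArrayList<>();

    TaggedObj(int creatorTid) { this.tid = creatorTid; }
}

class StoreBarrierSketch {
    /** Marks r and everything reachable from it as global, using an explicit worklist. */
    static void makeGlobal(TaggedObj r) {
        Deque<TaggedObj> work = new ArrayDeque<>();
        work.push(r);
        while (!work.isEmpty()) {
            TaggedObj o = work.pop();
            if (o.tid == TaggedObj.GLOBAL) continue;   // already global, as are its descendants
            o.tid = TaggedObj.GLOBAL;
            for (TaggedObj child : o.refs) work.push(child);
        }
    }

    /** Models a reference store of r into container a, updating the sets first. */
    static void storeReference(TaggedObj a, TaggedObj r) {
        if (r.tid != TaggedObj.GLOBAL
                && (a.tid == TaggedObj.GLOBAL || a.tid != r.tid)) {
            makeGlobal(r);                 // r becomes reachable by more than one thread
        }
        a.refs.add(r);
    }
}
```

A store into a container owned by the same thread leaves both objects local, so the common case costs only two tag comparisons.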

Finally, the actual race detection is carried out. For this, 20 bytecodes, for
instance, are instrumented which read or write to an object. Each time such a
bytecode is executed, it is checked whether it is a global object. If not, nothing has
to be done; races are impossible. If a global object is being dealt with, the data
structures can be accessed and it can be verified, using the vector clocks or
accordion clocks, whether this new instruction represents a data race. If so, this is
flagged to the user. The data structures containing the history of read and write
operations on the objects are then updated with the new location of this instruction
and the new vector clock indicating when the instruction occurred.

In a second implementation, the method may be implemented in a compiler.
In its most general form, as shown in Fig. 17a, a compiler 1704 is a program that
accepts as input a source code 1701, a program text in a certain language, and
produces as output an executable code 1702, a program text in machine language,
also called object code, while preserving the meaning of that text. Almost all
compilers translate from one input language, the source language, to one output
language, the target language, only. The source and target language are normally
expected to differ greatly: the source language could be C and the target language
is machine-specific binary executable code, to be executed by a machine 1703,
such as a Pentium processor for example. There exist also compilers that compile
from one language to another, for example from Java to C, or from bytecode to C.

In this second implementation, code according to the method of the present
invention is added to the generated code to be executed for every instruction where
a check is required. This means that, instead of a dynamic call to instrumentation
routines, a static call to instrumentation routines is added to the generated code. For
example:

- replace the allocating instructions (new, newarray, anewarray, multianewarray) to
instrument objects, so that extra memory is allocated for data structures used while
instrumenting the program,

- replace the implementation of the synchronisation operations (in Thread and
Object) so that vector clocks or accordion clocks are updated to build causal
relationships between events,

- replace the implementation of monitorenter and monitorexit so that each
time an object is locked through these bytecode instructions the accordion clocks
are updated,

- and replace the read and write instructions to observe whether objects become
global and whether read and write operations are involved in a race.

In a third implementation, the method is implemented in hardware. There
exist processors that execute Java bytecode directly (picoJava from Sun
Microsystems, for example). If an adaptation of such a processor is made, the
following is done:

- If an object is created, instrumentation is added. Therefore, on a new, newarray,
anewarray or multianewarray instruction, an interrupt is generated. The instructions
are intercepted in a trap routine which performs the memory allocation so that
memory is allocated for the created object and the extra instrumentation. For
example, in this extra memory, two fields could be stored. One field, the thread
identification field (TID), would indicate whether an object is global by containing the
value -1 or would contain the value of the identification of the only thread that can
reach the object. Initially, at the object's construction, this field would contain the
identification of the thread that created the object. The second field, the link field
(LINK), would serve as a link to a larger data structure that is only allocated when
this object becomes global. Initially, at the object's construction, this field would be
empty.

- If bytecodes that can change references to objects are implemented, a trap is
generated so that a jump to other bytecode is possible. This bytecode is then
responsible for analysing the changes to the references. If by such a change
objects might become reachable to more than one thread, these objects are
marked, using the extra allocated space, as being global. Also, the link field is used
to connect a larger data structure used for full data race detection. This data
structure would maintain a list of read operations and the last write operation
together with accordion clocks indicating 'when' these operations occurred.

- If the synchronisation operations in the Thread class (starting, joining, ...) are
implemented in pure Java code, some code is added to calculate the
vector/accordion clocks. The synchronisation operations in the Object class are
similarly instrumented. If monitorenter and monitorexit are implemented
in hardware, these instructions are trapped to an instrumentation trap handler where
the accordion clocks are updated.

- The instructions that perform read/write operations are trapped and intercepted in
a trap routine. First, the routine would check whether it is reading or writing to a local
object. If so, no further race detection is necessary since no race is possible. If, on
the other hand, it is detected that a read or a write occurred to a global object
(TID == -1), then the read or write operation is analysed using the history of read and
write operations, added during creation of the object and further expanded when the
object became global, in order to detect a data race. If a race is detected, a trap
could be taken that is dedicated to notifying the user of the fact that a race has
occurred. The user can then write his own interrupt handler to respond
appropriately to the occurrence.

A fourth implementation is the following: in a virtual machine from Sun, there
is a profiler interface called the "Java Virtual Machine Profiler Interface" (JVMPI). It is
used to attach a profiler (a shared library) to a virtual machine. This profiler can
request to be notified of all sorts of events that might interest it, for example the
entering of monitors, loading of classes, calling of member functions, etc. Using
JVMPI it is possible to request to be handed the class file of a class that will be
loaded. At that point, the profiler can modify the code and hand it back to the virtual
machine, which will effectively load it and start executing its code. Race detection
then goes as follows:

- When the class Object is loaded, it is adapted so that extra member fields are
added for the instrumentation, for example a thread identification field (TID) and a
reference to a more elaborate data structure used when doing full data race
detection. Furthermore, the code for the member functions that are used for
synchronisation (like wait) is instrumented so that it updates vector clocks.

- When the class Thread is loaded, it is adapted so that extra member fields are
added to be able to calculate the vector/accordion clocks. The member functions
used for synchronisation, like start, join, ... are modified so that they update
the vector clocks.

- When other classes are loaded, the class file is modified so that for each
monitorenter and monitorexit bytecode, the vector/accordion clocks of the
threads are updated. Furthermore, all the synchronised member functions are looked
for, and they are modified so that when they are called, the vector/accordion clocks
of the threads are updated.

- In addition, all read/write operations in all class files are replaced so that, using
the vector/accordion clocks and the extra data structures in Object, the accesses
to objects are analysed for potential races.

Implementation for another language/platform 

According to further embodiments of the present invention, programs written
in languages which are not strictly object oriented may be analysed for inconsistent
dynamic concurrency state transitions in the execution of multi-threaded programs,
e.g. data races. Preferably the language shields the contents of an object from the
outside world to a certain extent, i.e. the contents of an object are reachable by a
limited set of references/handles/entry points, whatever they are called. In particular,
the programs may use pointers or references. Using these it must be possible to
determine which objects are reachable from another object or from a certain thread.
Of course, the environment must support threads.

As the above embodiments have been described in detail, only the most
important differences are described in the following.
The method then goes like this:

- The allocation routines of objects are located. These routines are adapted so as
to add extra data structures to help with race detection, i.e. a thread identification
(TID) to indicate whether this is a local or global object, and an extra data structure
for the full data race detection.

- The data structures that describe the local state of a thread are located. The
data structures for calculating the vector clocks are added to those.

- Next, all constructs that can be used to create an order between the execution of
pieces of code are located. At these points, the code must be instrumented, be it
inline or by a call to a routine, so that the vector clocks of the threads are updated to
reflect the order that is created by the synchronisation operations.

- Next, all the read/write operations are located. Each must be replaced, be it
inline or by a jump to a subroutine, by code that updates the global and local sets
and which uses the data structures in each object to detect data races if these
objects are in the global set.
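
How the clocks might be updated at an ordering construct can be sketched as follows. This is the textbook vector clock scheme applied to a lock release and acquire, given here as an assumption; the description does not spell out the update rules, the accordion clocks it mentions can additionally grow and shrink, and the names `VectorClock`, `SyncInstrumentationSketch`, `onRelease` and `onAcquire` are hypothetical.

```java
/** Fixed-size vector clock; an accordion clock would resize as threads come and go. */
class VectorClock {
    final int[] ticks;

    VectorClock(int threads) { ticks = new int[threads]; }

    /** Advances the component of the thread that owns this clock. */
    void tick(int tid) { ticks[tid]++; }

    /** Component-wise maximum: afterwards this clock is ordered after 'other'. */
    void join(VectorClock other) {
        for (int i = 0; i < ticks.length; i++) {
            ticks[i] = Math.max(ticks[i], other.ticks[i]);
        }
    }

    /** True if every component is less than or equal to the matching one of 'other'. */
    boolean happensBefore(VectorClock other) {
        for (int i = 0; i < ticks.length; i++) {
            if (ticks[i] > other.ticks[i]) return false;
        }
        return true;
    }
}

class SyncInstrumentationSketch {
    /** Called where a thread releases a lock, e.g. at a monitor exit. */
    static void onRelease(int tid, VectorClock thread, VectorClock lock) {
        lock.join(thread);     // publish everything the releasing thread has done
        thread.tick(tid);      // later events of this thread get a new timestamp
    }

    /** Called where a thread acquires a lock, e.g. at a monitor entry. */
    static void onAcquire(int tid, VectorClock thread, VectorClock lock) {
        thread.join(lock);     // order this thread after the previous release
        thread.tick(tid);
    }
}
```

Two accesses whose clocks are not ordered by `happensBefore` in either direction are the ones treated as concurrent by the race check.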



[...] dynamic allocation of data structures is possible. This means that all data structures
present in a program are already visible in its executable, i.e. they are statically
allocated. These data structures must be located and expanded statically to add
data structures for race detection. This is preferably done by the compiler.
Furthermore, the extra data structures added to objects are probably also allocated
statically. This is possible since they would have a fixed size since there would be a
fixed number of threads (else dynamic data structures are needed). Therefore no
gain in memory consumption is obtained by this technique, but a gain in execution
speed since these data structures in the objects need not be updated if a local
object is involved.

In this case of the language not using a heap, all objects are pre-allocated.
So initially, these objects are to be marked global since it is not known which
threads have access to them. It is therefore preferred to run as soon as possible the
analysis routines that can refine the global set so as to mark most of these objects
as local to a certain thread.

Alternative Embodiment of a Refiner

In one embodiment the classification into local and global objects goes as
follows. This embodiment is called a "two spaces copying algorithm with tag
provision" and is based on, and can be used with, garbage collectors of the two
space copying type. Each object is instrumented with a data structure which
identifies whether it belongs to the global or to a local set. The data structures used
are the same as the ones used for implementing the "mark and sweep algorithm",
except for the one bit which shows whether an object is marked or not. This bit is not
needed in the "two spaces copying algorithm", as there, instead of marking an object
which belongs to a thread, it is copied to a second memory (or a new region of
memory).

- A tag comprising a thread identification field (TID) is added to every object when
it is instantiated.

- All objects in the root set of a particular thread are copied to a new region of
memory and the TID of this thread is entered in the mark field of the copy. Pointers
to the object need to be updated since it is stored elsewhere now.

- Then all descendants (children, children's children, etc.) are copied to the new
region, still marking them with the TID.

- The same procedure is then started for the next thread. If an object is found that
has already been copied into the new region, then its TID is examined. If the object's
TID is different from the TID of the current thread, then it was clearly copied in by
the analysis of a previous thread and therefore it is reachable by more than one
thread. It is thus a global object, so its TID is marked -1, as well as the TIDs of all its
descendants.

- Once analysing all the threads is finished, a copy consisting of only live objects
is obtained, and each object has been marked with a TID that is either a number of
a thread, indicating that it is local, or TID = -1, indicating that the object is global.
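
The tagging performed by the copying steps above can be sketched as follows. To keep the example short, an identity map from each old object to its tag stands in for the new memory region, so no object is actually moved and no pointer is updated; that simplification, the thread numbering by list position and the names `FromObj` and `CopyingRefinerSketch` are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical object in the old memory region. */
class FromObj {
    final String name;
    final List<FromObj> refs = new ArrayList<>();

    FromObj(String name) { this.name = name; }
}

class CopyingRefinerSketch {
    static final int GLOBAL = -1;

    /**
     * Returns the TID tag each live object would carry in the new region.
     * Presence in the map stands in for "has been copied".
     */
    static Map<FromObj, Integer> classify(List<List<FromObj>> rootsPerThread) {
        Map<FromObj, Integer> toSpace = new IdentityHashMap<>();
        for (int tid = 0; tid < rootsPerThread.size(); tid++) {
            Deque<FromObj> work = new ArrayDeque<>(rootsPerThread.get(tid));
            while (!work.isEmpty()) {
                FromObj o = work.pop();
                Integer tag = toSpace.get(o);
                if (tag == null) {                     // first visit: copy and tag with this thread
                    toSpace.put(o, tid);
                    work.addAll(o.refs);
                } else if (tag != tid && tag != GLOBAL) {
                    markGlobal(o, toSpace);            // copied by an earlier thread: shared
                }
            }
        }
        return toSpace;
    }

    /** Tags the start object and all of its descendants as global. */
    private static void markGlobal(FromObj start, Map<FromObj, Integer> toSpace) {
        Deque<FromObj> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            FromObj o = work.pop();
            Integer old = toSpace.put(o, GLOBAL);
            if (old == null || old != GLOBAL) work.addAll(o.refs);
        }
    }
}
```

Objects that never appear in the returned map were not reachable from any thread, which corresponds to the observation that the copy consists of live objects only.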

According to still a further embodiment, which is a modification of the
previous embodiment, when a global object is encountered, it is copied to a third
region instead of marking its TID -1. This is called a "three spaces copying
algorithm".

Various combinations of garbage collectors and refiners are included within 
the scope of the present invention, each combination representing a separate 
embodiment of the present invention. For example, a copying garbage collector, e.g. 
two-space copying, and a mark and sweep global/local analysis; a mark and sweep 
garbage collector and a copying global/local analysis; a copying garbage collector 
and a copying global/local analysis; or a mark and sweep garbage collector and a
mark and sweep global/local analysis may be combined.



Improved garbage collector

The method of the present invention can also be combined with or included
as an integral part of a known garbage collector in order to obtain an improved
garbage collector. Garbage collection is then split up in two portions: a local garbage
collection carried out only on a local set, and a full garbage collection on all local
and global sets. It is used as follows:

- The local sets and a global set, as explained above, are constructed and
maintained. Therefore, each object is instrumented as explained above, with a.o. a
thread identification tag containing the thread identification of the thread to which the
object belongs if it is a member of a local set, or a value that can never be assigned
to a running thread, e.g. -1, if the object is a member of the global set.

- If an executing thread needs to allocate extra memory, but the memory is
exhausted (or some threshold is passed), then a local garbage collector is first
started. This local garbage collector marks all objects that are reachable from this
one thread. Then it looks among all objects local to this one thread. If among these
local objects there are objects that were not marked by the garbage collector, then
these objects can be freed since these are objects that were only referenced by this
one thread and at this point are no longer referenced at all. These objects cannot be
referenced by any other thread and can therefore be removed without the help of
another thread if they are not referenced anymore.

- If the thread is unable to reclaim enough memory on its own, then a full garbage
collection is started.
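
The local collection step can be sketched as below. The heap is modelled as a plain list and "freeing" as removal from that list, which is an assumption made only to keep the example self-contained; `HeapObj`, `LocalGcSketch` and `collectLocal` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical heap object carrying the thread identification tag. */
class HeapObj {
    static final int GLOBAL = -1;
    final int tid;
    final List<HeapObj> refs = new ArrayList<>();

    HeapObj(int tid) { this.tid = tid; }
}

class LocalGcSketch {
    /** Frees unreachable objects local to thread 'tid'; returns how many were freed. */
    static int collectLocal(int tid, List<HeapObj> roots, List<HeapObj> heap) {
        // Mark phase: everything reachable from this one thread's roots.
        Set<HeapObj> marked = Collections.newSetFromMap(new IdentityHashMap<HeapObj, Boolean>());
        Deque<HeapObj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapObj o = work.pop();
            if (marked.add(o)) work.addAll(o.refs);
        }
        // Sweep phase: only objects tagged with this thread's id are candidates.
        int freed = 0;
        for (Iterator<HeapObj> it = heap.iterator(); it.hasNext();) {
            HeapObj o = it.next();
            if (o.tid == tid && !marked.contains(o)) {
                it.remove();
                freed++;
            }
        }
        return freed;
    }
}
```

Objects tagged global or tagged with another thread's identification are never touched here, which is why no other thread needs to be stopped for this step.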

The advantage of this approach is that, usually, a thread is able to clean up a
large amount of data and allocate new memory without the intervention of other
threads. A full garbage collection is then not required. This is a good thing, because
a full garbage collection is a very disruptive process. To do a fast, full garbage
collection, usually all the threads are stopped since it is very hard to clean up data
that is still being manipulated by other threads.

Claims

1.- A computer implemented method for classifying objects into a set of global
objects, containing objects that can be reached by more than one thread, and a
set of local objects, containing objects that can only be reached by one thread,
when executing multi-threaded programs, whereby the classifying is done
dynamically by observing modifications to references to objects by operations
performed in the computer system.

2.- The method of claim 1, wherein each object is provided with an instrumentation 
data structure to enable observation of modifications to references to objects. 

3.- The method of claim 2, wherein the instrumentation data structure comprises at
least a thread identification tag for identifying whether an object can be reached
by only one thread or by more than one thread.

4.- The method of any previous claim, further comprising the step of:

recording in a memory concurrency state transition information of global
objects.

5.- The method according to any of claims 1 to 4, further comprising a step of

selectably performing garbage collection only on the set of local objects.

6.- A computer implemented method for detecting inconsistent dynamic
concurrency state transitions in the execution of multi-threaded programs
amenable to object reachability analysis, the method comprising the steps of:

executing multiple threads on a computer;

at least periodically during execution of the threads classifying instantiated
objects into a set of global objects (503; 1508), containing objects that can be
reached by more than one thread, and sets of local objects (504; 1505, 1506,
1507), containing objects that can only be reached by one thread, and

recording in a memory concurrency state transition information of global
objects.

7.- The method according to claim 6, further comprising the step of determining
occurrence of data races between two or more threads.

8.- The method according to claim 6 or 7, wherein the set of global objects (503;
1508) is periodically analysed so as to remove objects from the global set which
are only reachable by one thread.

9.- Method according to claim 8, wherein the removal of objects of the global set
which are only reachable by one thread takes place after a garbage collection
step.

10.- Method according to any of claims 6 to 9, wherein each object created during
execution of the program is provided with an instrumentation data structure
(404) to enable race detection.

11.- Method according to claim 10, wherein the instrumentation data structure (404)
comprises

- a first address field (405) for containing the address of a first data structure
(409; 1401) when the object is locked for the first time,

- a second address field (406) for containing the address of a second data
structure (410; 1301) if the object is of type thread or a subtype thereof,

- a thread identification field (407) for recording whether the object belongs to
the global set or to a local set,

- a third address field (408) for containing the address of a third data structure
(411; 1601) when the object belongs to the global set.

12.- Method according to claim 11, wherein the first data structure (409; 1401)
comprises a vector clock field for taking into account the fact that threads are
created and destroyed dynamically.

13.- Method according to claim 11 or 12, wherein the second data structure (410;
1301) comprises a vector clock field (1302) taking into account the fact that
threads are created and destroyed dynamically, and indicating the current vector
clock for the currently executing event on the thread, and a thread identification
number (1303).

14.- Method according to any of claims 11 to 13, wherein the third data structure
(411; 1601) comprises a counter field (1602) for containing the number of
member variables present in the object, and a fourth address field (1603) for
containing the address of a fourth data structure (1604).

15.- Method according to claim 14, wherein the fourth data structure (1604) is an
array of a length given in the counter field that contains addresses of fifth data
structures (1606).

16.- Method according to claim 15, wherein the fifth data structures (1606) comprise

- a field description field (1607) describing the member variable that is being
observed,

- a lock field (1608) for obtaining exclusive access to the fifth data structure
(1606),

- a read list address field (1609) for containing the address of a doubly linked
list of read information structures (1611) used to record information on
relevant read operations,

- a write list address field (1610) for containing the address of a write
information data structure (1618) used to record information on relevant
write operations.

17.- Method according to claim 16, wherein the read information structure (1611)
describes the most recent read operation performed for each separate thread on
the corresponding member variable.

18.- Method according to any of claims 16 or 17, wherein the write information data
structure (1618) describes the most recent write operation performed on the
corresponding member variable.

19.- A computer implemented method for determining the order of events in the
presence of a dynamically changing number of threads of a computer program
executable on a computer, wherein a clock data structure (601) is maintained in
memory, from which clock data structure the occurrence of two events in parallel
during execution of the threads can be determined, the dimension of the clock
data structure (601) being determined dynamically dependent upon the number
of threads created and destroyed during execution of the program.

20.- Method according to claim 19, wherein the clock data structure (601) comprises
a lock (602) to be taken by a thread if exclusive access to the clock is required,
and the address (603) of a local clock data structure (604).

21.- Method according to claim 20, wherein the local clock data structure (604)
comprises

- a lock field (605) to be locked by a thread for obtaining exclusive access to
the local clock data structure (604),

- a count field (606) indicating the number of different vector clocks that use
the local clock data structure (604),

- an array of values (607) containing the values of the local clock data structure
(604),

- a next field (608) and a previous field (609) to link the local clock data
structure (604) in a doubly linked circular list (610).

22.- Use of the method according to any of the claims 1 to 21, whereby the method is
implemented in association with a virtual machine.

23.- The use according to any of the claims 1 to 21, wherein the method is
implemented in an interpreter.

24.- Use of the method of any of claims 1 to 21, whereby the method is implemented
in a compiler.

25.- Use of the method according to any of claims 1 to 21, whereby the method is
implemented in hardware.

26.- Use of the method according to any of claims 1 to 21, whereby the method is
implemented in a garbage collector.

27.- A computer readable data carrier comprising a computer executable computer
programming product for executing any of the methods of claims 1 to 21 on a
computer.

Abstract

Topological, on-the-fly classification of objects into a global set and 

local sets 

The present invention relates to concurrently executing program threads in computer
systems, and more particularly to detecting data races.

A computer implemented method for detecting data races in the execution of
multi-threaded, strictly object oriented programs is provided, whereby objects on a heap
are classified in a set of global objects, containing objects that can be reached by
more than one thread, and sets of local objects, containing objects that can only be
reached by one thread. Only the set of global objects is observed for determining
occurrence of data races.